1 Abstract

The study of molecular evolution rests on the classical fields of population ge- 
netics and systematics, but the increasing availability of DNA sequence data has 
broadened the field in the last decades, leading to new theories and methodolo- 
gies. This includes parsimony and maximum likelihood methods of phylogenetic 
tree estimation, the theory of genome rearrangements, and the coalescent model 
with recombination. These all interact in the study of genome evolution, yet to 
date they have only been pursued in isolation. 

We present the first unified parsimony framework for the study of genome 
evolutionary histories that includes all of these aspects, proposing a graphical 
data structure called a history graph that is intended to form a practical basis for 
analysis. We define tractable upper and lower bound parsimony cost functions 
on history graphs that incorporate both substitutions and rearrangements. We 
demonstrate that these bounds become tight for a special unambiguous type of 
history graph called an ancestral variation graph (AVG), which captures in its 
combinatorial structure the operations required in an evolutionary history. 

For an input history graph G, we demonstrate that there exists a finite set of 
interpretations of G that contains all minimal (lacking extraneous elements) and 
most parsimonious AVG interpretations of G. We define a partial order over this 
set and an associated set of sampling moves that can be used to explore these 
DNA histories. These results generalise and conceptually simplify the problem
so that we can sample evolutionary histories using parsimony cost functions that
account for all substitutions and rearrangements in the presence of duplications.



2 Introduction 

In genome evolution there are two interacting relationships between nucleotides 
of DNA, resulting from two key features: DNA nucleotides descend from com- 
mon ancestral nucleotides, and they are covalently linked to other nucleotides. 
In this paper we explore the combination of these two relationships in a simple 
graph model, allowing for change by the process of replication, where a complete 
sequence of DNA is copied, by substitution, in which the chemical characteristics 
of a nucleotide are changed, and by the coordinated breaking and rematching 
of covalent bonds between nucleotides in rearrangement operations. 

This paper will develop a parsimony model for these processes of change.
However, they each have quite different dynamics, which leads us to account
for their parsimony costs differently. As DNA molecules replicate essentially
continuously, this process has zero cost. Much more rarely substitutions occur
and more rarely still rearrangement operations take place. By making the com- 
mon assumption that all substitutions and rearrangements occur independently 
of one another, we account for the cost of these latter two processes by inde- 
pendent rearrangement and substitution costs, which are themselves essentially 
sums over the numbers of inferred events. Importantly, replications that are 
combined with unbalanced rearrangements, which lead either to gain or loss 
of sequence, are costed in terms of the underlying rearrangement cost. By ac- 
counting for these costs independently and allowing for arbitrary replication, 
we build upon a wealth of models, data structures and algorithms that have 
studied these processes either in isolation or in a more limited combination. 

Such evolutionary methods generally start with a set of observed sequences 
in an alignment, an alignment being a partitioning of elements in the sequences 
into equivalence classes, each of which represents a set of elements that share a 
recognizably recent common ancestor. Though alignments represent an uncertain
inference, and though their optimisation for standard models is intractable
for multiple sequences (Elias 2006), we make the common assumption that the
alignment is given, as efficient heuristics exist to compute reasonable genome
alignments (Miller et al. 2007, Darling et al. 2010, Paten et al. 2011b).



If the sequences in an alignment only differ from one another by substitutions 
and rearrangements that delete subsequences, or insert novel subsequences (col- 
lectively indels) , then the alignment data structure is naturally a 2D matrix. In 
such a matrix, by convention, the rows represent the sequences and the columns 
represent the equivalence classes of elements. The sequences are interspersed 
with "gap" symbols to indicate where elements are missing from a column due 
to indels. From such a matrix alignment, phylogenetic methods infer a history 
of replication (Felsenstein 2004). Such a history is representable as a phylogenetic
tree, whose internal nodes represent the most recent common ancestors
(MRCA) of subsets of the input sequences. To create a history including the
MRCA sequences, additional rows can be added to the matrix (Blanchette et al.
2004, Kim and Sinha 2007, Paten et al. 2008). For parsimony models, both
imputing phylogenetic trees from matrix alignments and calculating MRCA sequences
given a phylogenetic tree and a matrix alignment are NP-complete (Day
1987, Chindelevitch et al. 2006).



The alignment of long DNA sequences related by substitutions and homol- 
ogous recombination rearrangements is also representable as a matrix; homolo- 
gous recombination operations being the primary modifier of individual genomes 
within a population. However, the history of replication of such an alignment is 
no longer generally representable as a single phylogenetic tree, as each column in 
the matrix may have its own distinct tree. To represent the MRCAs of such an 
alignment requires a more complex data structure, termed an ancestral recombination
graph (ARG) (Song and Hein 2005, Westesson and Holmes 2009).



It is NP-hard under the infinite sites model (no homoplasy) to determine the 
minimum number of homologous recombinations needed to explain the evolu- 
tionary history of a given set of sequences, and probably NP-hard under more 
general models (Wang et al. 2001).

Larger DNA sequences, or complete genomes, are generally permuted by 
more complex rearrangements, such that the matrix alignment representation 
is insufficient. Instead, the alignment is naturally a form of graph, called a 
breakpoint graph (see Section [s] for a formal introduction). Using such graphs, 
for pairs of genomes and when rearrangements are assumed balanced, inferring
rearrangement histories based upon inversions (Hannenhalli and Pevzner),
translocations (Bergeron et al. 2006) or double-cut-and-join (DCJ) operations
(Yancopoulos et al. 2005) has polynomial or better time complexity.
However, for three or more genomes with balanced rearrangements, or when
rearrangements are unbalanced, these exact parsimony methods are intractable,
and heuristics become necessary (Bourque and Pevzner 2002, Ma et al. 2008).
Notably, there have been several recent methods to extend these models to handle
limited forms of unbalanced rearrangements (Yancopoulos and Friedberg
2009, Bader 2010, Braga et al. 2011).



The graph model introduced in this paper is capable of representing a gen- 
eral evolutionary history for any combination of replication, substitution and 
rearrangement operations, including homologous recombinations. It therefore 
generalises phylogenetic trees, graphs representing histories with indels, ances- 
tral recombination graphs and breakpoint graphs. We start by introducing this 
graph and then develop a parsimony model that, somewhat imperfectly, gener- 
alises parsimony variants of all the problems mentioned, facilitating the study of 
all these subproblems in one unified domain. We provide a sampling approach 
to cope with the NP-hardness of the general parsimony problem. 



3 Sequence Graphs and Threads 

Sequence graphs are used extensively in comparative genomics, in rearrangement
theory typically under the name (multi or master) breakpoint graph (Alekseyev
and Pevzner 2008, Ma et al.), A-bruijn (Raphael et al. 2004) or
adjacency graph (Paten et al. 2011a). We
use the following bidirected form, which is similar to that used by Medvedev
and Brudno (2009) for sequence assembly.



Definition 1. A (bidirected) sequence graph $G = (V_G, E_G)$ is a graph in which
a set $V_G$ of vertices, termed segments, are connected by a set $E_G$ of bidirected
edges (Edmonds and Johnson 1970), termed bonds. A segment represents a
subsequence of DNA. A segment is oriented, having a tail side and a head side.
For segment $x$, a side is denoted $x_\alpha$, where $\alpha \in \{head, tail\}$. These categories
{head, tail} are called orientations. A bond, which represents the covalent bond
between adjacent nucleotides of DNA, is a pair set of sides. We refer to the two
sides contained in a bond as its endpoints. Bonds are bidirected, in that each
endpoint is not just a vertex, but a vertex with an independent orientation
(either head or tail). For convenience, we say a side is attached if it is contained
in a bond, else it is unattached. We say a segment is attached if either of its
sides is attached, else it is unattached.

Associated with a sequence graph is a labeling function. 

Definition 2. The function $l : V_G \to \Sigma^* \cup \{\emptyset\}$ is the labeling function, where
$\Sigma = \{A/T, C/G, G/C, T/A\}$ is the alphabet of bases, which are oriented, paired
nucleotides of DNA, and $\Sigma^*$ is a set of sequences of bases. For $\rho/\tau \in \Sigma^*$, $\rho$ is
the forward complement and $\tau$ is the reverse complement. Labels are directed.
Traversed from the tail to the head side of a segment a label is read as its
forward complement, and reversely, traversed from the head to the tail side of a
segment a label is read as its reverse complement. A segment $x \in V_G$ for which
$l(x) = \emptyset$ is unlabeled.

In this paper we limit ourselves to the following form of sequence graph. 

Definition 3. A thread graph is a sequence graph in which each side is con- 
tained in at most one bond. 

An example thread graph is shown in Figure 1.

Definition 4. A connected component in a thread graph is called a thread. 

A thread represents a single DNA sequence whose bases are encoded by the 
labels of the segments, where unlabeled segments represent missing information. 
A thread may be a simple cycle, representing a circular DNA molecule, or have 
two unattached sides, in which case it represents a linear DNA molecule or 
fragment of a larger DNA molecule. A thread graph is phased, in that each 
segment in it is part of one thread. In contrast, a sequence graph that is not 
a thread graph may be unphased, in that there exist many possible maximal 
thread subgraphs for each of its connected components. 

Figure 1: A thread graph. For visual appeal, segments are the arrow shapes 
with the sides indicated by the ends of the arrows. Labels within the arrows 
represent the subsequence of DNA when traversed from the tail side to the head 
side of the arrow. Bonds are the lines connecting the ends of the arrow shapes. 
They are bidirected, i.e. there are 3 unordered types: head-tail (symmetrically
tail-head), tail-tail and head-head bonds. In prior illustrations of bidirected
graphs (Medvedev and Brudno 2009) orientations were drawn on the lines;
however, the semantics of the graph are still the same, in that head and tail
orientations are properties of the endpoints of the bonds, not the segments. The
graph contains three linear threads. As an example, because the middle segment
is attached in the opposite direction and therefore reverse-complemented when
traversed left-to-right, the top thread represents the sequence
"GAGGGTGGCCCGAGAATACTTTAAGGTTCTGAATAAACCCCAGCACAAATTTT"
(from left-to-right, colour used to distinguish segment labels) and its reverse complement,
"AAAATTTGTGCTGGGGTTTATTCAGAACCTTAAAGTATTCTCGGGCCACCCTC"
(from right-to-left). The colours of the arrows represent homologies
between the segments; these are not part of the thread graph itself.
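
The reading rule of Definition 2 can be illustrated on the top thread of Figure 1. The following is a minimal sketch, not the authors' implementation: the function names and the representation of a thread as a list of (label, entered_by_tail) pairs are assumptions made for illustration. A segment entered at its tail side contributes its label as written; a segment entered at its head side contributes the reverse complement.

```python
# Complement of each single nucleotide.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def read_thread(segments):
    """Spell out a linear thread.

    segments: list of (label, entered_by_tail) pairs in traversal order
    (a hypothetical encoding chosen for this sketch). A label is read
    forwards when the segment is traversed tail-to-head, and as its
    reverse complement when traversed head-to-tail.
    """
    parts = []
    for label, entered_by_tail in segments:
        parts.append(label if entered_by_tail else reverse_complement(label))
    return "".join(parts)


# Top thread of Figure 1: the middle segment is traversed head-to-tail.
top_thread = [
    ("GAGGGTGGCCCGAGAA", True),
    ("TATTCAGAACCTTAAAGTA", False),
    ("AACCCCAGCACAAATTTT", True),
]
forward = read_thread(top_thread)
backward = reverse_complement(forward)
```

Here `forward` and `backward` reproduce the two strings quoted in the caption of Figure 1.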
4 History Graphs



Nucleotides of DNA derive from one another by a process of replication. This 
replication process is represented in history graphs, which add ancestry rela- 
tionships to thread graphs. 

Definition 5. A history graph $G = (V_G, E_G, B_G)$ is a thread graph with an
additional set $B_G$ of directed edges between segments, termed branches. Each
segment is incident with at most one incoming branch. The event graph $D(G)$
is the directed graph formed by the contraction¹ of bonds in $E_G$. For $G$ to be
a history graph $D(G)$ must be a directed acyclic graph (DAG), a property we
term acyclicity.
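
The acyclicity condition can be checked mechanically. The sketch below is illustrative only: the input encoding (bonds as pairs of (segment, orientation) endpoints, branches as (parent, child) pairs) and the function name are assumptions, not part of the paper. Bonds are contracted with a union-find structure, and the resulting directed graph is tested for cycles with Kahn's algorithm.

```python
from collections import defaultdict, deque


def event_graph_is_acyclic(segments, bonds, branches):
    """Return True if contracting all bonds leaves a DAG of branches.

    segments: iterable of hashable segment names.
    bonds: iterable of ((segment, orientation), (segment, orientation)).
    branches: iterable of (parent_segment, child_segment).
    """
    # Union-find: contracting a bond merges the two segments it joins.
    parent = {s: s for s in segments}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for (x, _), (y, _) in bonds:
        parent[find(x)] = find(y)

    # Branches become directed edges between contracted vertices.
    nodes = {find(s) for s in parent}
    successors = defaultdict(set)
    indegree = {n: 0 for n in nodes}
    for p, c in branches:
        a, b = find(p), find(c)
        if a == b:
            return False  # a branch inside one contracted vertex is a loop
        if b not in successors[a]:
            successors[a].add(b)
            indegree[b] += 1

    # Kahn's algorithm: a DAG is exhausted by repeatedly removing sources.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    return removed == len(nodes)
```

For instance, two bonded ancestors whose children are also bonded give an acyclic event graph, whereas bonding an ancestor of one branch to a descendant of another, and vice versa, creates a cycle.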

Example history graphs are shown in Figure 2(A, B), along with an event
graph in Figure 2(C) for the history graph shown in Figure 2(B). We now define
useful terminology to discuss branch relationships.

Definition 6. Each connected component of branches forms a branch-tree. Two
segments are homologous if they are in the same branch-tree. A segment y is a
descendant of a segment x, and conversely x is an ancestor of y, if y is reachable
by a directed path of branches from x. If two homologous segments do not have
an ancestor/descendant relationship then they are indirectly related.

For a branch $e = (x, y)$, x is the parent of e and y, and y is the child of e
and a child of x. Similarly, e is the parent branch of y and a child branch of x. 

A segment is a leaf if it has no incident outgoing branches, a root if it has 
no incident incoming branches, else it is internal. 

We reuse the terminology of parent, child, homologous, ancestor, descendant 
and indirectly related with sides. Two sides have a given relationship if their 
segments have the relationship and they have the same orientation. Similarly, 
a side is a leaf (resp. root) if its segment is a leaf (resp. root). 

5 Evolutionary Histories 

Now we formally define a notion of a history graph without significant missing 
information. 

Definition 7. An epoch is a history graph in which:

• Every branch-tree is identical, composed of a root segment with n children.

• Every segment is labelled.

• A root segment has at most one child with a label different to its own.

• A side is attached if any homologous side is attached.

• If the number of leaves in each branch-tree is greater than 1, then for each
pair of leaf sides connected by a bond there exists a bond connecting their
parents.

¹The contraction of an edge e is the removal of e from the graph and merger of the vertices
x and y incident with e to create a new vertex z, such that edges incident with z were incident
either with x or y or both, in the latter case becoming a loop edge on z.

Figure 2: (A) A history graph representing homology relationships between the
segments in Figure 1. Due to space, colours are used as labels (and match those
in Figure 1), with unlabeled segments shaded grey. Two segments have the
same colour shade if and only if they have identical labels. The dotted arrows
represent branches. Four ancestral segments are added relative to Figure 1 to
represent the common ancestral segments of the subsets of homologous segments
in Figure 1. (B) An extension of (A). (C) The event graph for (B). (D) An
evolutionary history with four epochs (1-4), and rearrangements given names
corresponding to their type. It is a realization for the graphs in (A) and (B).
(E) The module graph for the most recent epoch (nearest leaves) in (D).

Definition 8. An evolutionary history is a history graph that can be vertex- 
partitioned into a finite sequence of epochs, such that the leaf layer of an epoch 
is the root layer of the following epoch. 

An example evolutionary history with epoch subgraphs is shown in Figure
2(D). An evolutionary history can be thought of as a history graph from which
a parsimonious sequence of substitutions and rearrangements can be trivially 
derived. Note that there is no general requirement in this model for replications 
to produce only two copies of a thread, though parsimonious histories will often 
involve them. 

Definition 9. The substitution cost of a branch in an evolutionary history is
0 if the labels of its endpoints are identical, else it is 1. The substitution cost of an
evolutionary history H is the sum of its branches' substitution costs, denoted
s(H).

The example evolutionary history in Figure 2(D) has substitution cost 4.
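
As a minimal illustration of Definition 9 (a sketch under assumed data structures, with branches given as (parent, child) pairs and labels as a dictionary, neither of which is prescribed by the paper), the substitution cost is simply the number of branches whose endpoint labels differ:

```python
def substitution_cost(branches, labels):
    """Count branches whose parent and child labels differ.

    branches: iterable of (parent, child) segment pairs.
    labels: dict mapping every segment to its label; in an evolutionary
    history every segment is labelled, so no missing keys are expected.
    """
    return sum(1 for parent, child in branches if labels[parent] != labels[child])


# A root with two children, one of which carries a substitution.
example_branches = [("root", "left"), ("root", "right")]
example_labels = {"root": "A", "left": "A", "right": "G"}
example_cost = substitution_cost(example_branches, example_labels)
```

In this toy example exactly one branch changes the label, so `example_cost` is 1.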

The substitution cost defined is motivated by the case $\Sigma^* = \Sigma$, i.e. single
base labels, where the substitution cost is the minimum number of single base
changes. As any history graph in which all homologous labels have the same
length can easily be converted to a semantically equivalent history graph for
which $\Sigma^* = \Sigma$, we do not investigate more complex substitution costs. However,
generalizations of the simple notion of substitution costs used here are relatively 
straightforward. 

Definition 10. The module graph of an epoch G is the graph resulting from 
the deletion of all labels in G, contraction of all branches in G, splitting of each 
segment into a separate vertex for each side, partitioning the bond incidences 
between the new vertices by the side to which they connect, and, finally, deletion 
of all isolated vertices. 

Figure 2(E) shows an example module graph.

Definition 11. The Yancopoulos complexity of an epoch is $\sum_i (\lceil k_i/2 \rceil - 1)$, where
$k_i$ is the number of sides in the ith connected component of the module graph
of the epoch. The rearrangement cost of an evolutionary history H is the sum
of its epochs' Yancopoulos complexities, denoted r(H).

The example evolutionary history in Figure 2(D) has rearrangement cost 3.
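
The following sketch computes the Yancopoulos complexity from the edge list of a module graph. It assumes the complexity is the sum over connected components of $\lceil k_i/2 \rceil - 1$, with $k_i$ the number of sides in component i; the edge-list input format and the function name are assumptions for illustration. Isolated sides never appear because only sides with incident bonds are listed.

```python
from collections import Counter


def yancopoulos_complexity(edges):
    """Sum of ceil(k/2) - 1 over connected components with k sides.

    edges: iterable of (side, side) pairs, one per bond of the module
    graph (root-layer and leaf-layer bonds alike).
    """
    parent = {}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent.setdefault(u, u)
        parent.setdefault(v, v)
        parent[find(u)] = find(v)

    # Component sizes, counted in sides (vertices of the module graph).
    sizes = Counter(find(v) for v in list(parent))
    return sum((k + 1) // 2 - 1 for k in sizes.values())


# Root bonds {a,b}, {c,d} replaced by leaf bonds {a,c}, {b,d}:
# a single four-sided cycle, i.e. one DCJ operation.
one_dcj = yancopoulos_complexity([("a", "b"), ("c", "d"), ("a", "c"), ("b", "d")])
# A bond that is unchanged between root and leaf layers costs nothing.
unchanged = yancopoulos_complexity([("a", "b"), ("a", "b")])
```

A component of at most two sides contributes zero, which is consistent with an epoch containing a replication having no rearrangement cost.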

Lemma 1. The Yancopoulos complexity of an epoch is the minimum number of
double-cut-and-join (DCJ) operations required to convert the root layer's bonds
into the leaf layer's bonds.

Proof. Similar to that given in Yancopoulos et al. 2005. □



Note that by definition of an epoch, the module graph that results from an
epoch has degree at most 2, i.e. every vertex is incident with at most two edges.
For epochs that contain root segments with multiple children (i.e. replications),
the module graph has degree at most 1, because of the requirement for this
case that for each pair of leaf sides connected by a bond there exists a bond
connecting their parents. Hence the rearrangement cost is always 0 for an epoch
that contains a replication.

Because different studies lay different emphases on substitution or rearrange- 
ment (for example because of the available data) and because the events do not 
have the same probability in practice, we allow for a degree of freedom in the 
definition of the overall cost function. 

Definition 12. An (evolutionary history) cost function for an evolutionary 
history is any monotone function on the substitution and rearrangement costs 
in which both substitutions and rearrangements have non-zero cost. 



6 Reduction 

Not all history graphs are as detailed as evolutionary histories. We define below 
a partial order relationship that describes how one graph can be a generalization 
of another graph, so for example, a less detailed history graph can be used to 
subsume multiple evolutionary histories. 

Definition 13. A branch whose child is unlabeled and unattached is referred 
to as having a free-child. A branch whose parent is unlabeled, unattached and 
a root with a single child is referred to as having a free-parent. A segment is 
isolated if it has no incident bonds or branches. 

Definition 14. A reduction operation is an operation upon a history graph 
that either: 

• Deletes a bond, an isolated segment or the label of a segment. 

• Contracts a branch with a free-child or free-parent. 

See Figure 3(A-E) for examples. The inverse of a reduction operation is an
extension operation. 



Lemma 2. The result of a reduction operation is itself a history graph. 

Proof. Easily verified. □ 

Definition 15. A history graph G is a reduction of another history graph G' 
if G is isomorphic to a graph that can be obtained from G' by a sequence of
reduction operations, termed a reduction sequence. 

Lemma 3. The reduction relation is a partial order. 

Proof. Easily verified. □ 
Figure 3: (A-E) Reduction operations. For each case the graph on the left is a
reduction of the graph on the right. (A) A label deletion. (B) A bond deletion. 
(C) A segment deletion. (D) A contraction of a branch with a free-child. (E) 
A contraction of a branch with a free-parent. 



Definition 16. We write $G \preceq G'$ to indicate that G is a reduction of G' and
$G \prec G'$ to indicate that G is a reduction of G' not equal to G'. Like reduction
and extension operations, if G is a reduction of H, H is an extension of G.

An examination of the reduction relation is in the discussion section and
Figure 11.



7 History Graph Cost 

Using the parsimony principle, we now extend parsimony cost functions, previ- 
ously defined on evolutionary histories, to all history graphs. 

Definition 17. An evolutionary history H that is an extension of a history
graph G is called a realisation of G. The set $\mathcal{H}(G)$ is the set of realisations of G.

Definition 18. For a given cost function c the cost of a history graph G is²
$C(G, c) = \min_{H \in \mathcal{H}(G)} c(s(H), r(H))$.

Lemma 4. The problem of finding the cost of a history graph is NP-hard. 

Proof. Parsimony problems on either substitutions or rearrangements alone are 
NP-hard and can be formulated as special cases of the problem of finding the 
minimum cost realization of a history graph (Day 1987, Tannier et al. 2009). □



8 The Lifted Graph 

Although determining the cost of a history graph is NP-hard, we will show
that the cost can be bounded such that the bounds become tight for a broad,
characteristic subset of history graphs. To do this we introduce the concept of
lifted labels and bonds.

²Note: while $\mathcal{H}(G)$ is infinite we show in the sequel that the infimum of this set of costs is
always achieved by a history, hence the infimum is the minimum.



Definition 19. The free-roots of a history graph G are a set of additional
segments such that a single, unique free-root is assigned to each root segment.


Definition 20. For a segment $x$, $A(x)$ is the most recent labeled ancestor of
$x$, else if no such segment exists, the free-root of the branch-tree containing $x$.
For a side $x_\alpha$, overloading notation, $A(x_\alpha)$ is the most recent attached ancestor
of $x_\alpha$, else if no attached ancestor exists, the side $x'_\alpha$, where $x'$ is the free-root
of the branch-tree containing $x$. For a side or segment $x$, $A(x)$ is the lifting
ancestor of $x$.

Definition 21. For a labeled segment $y$, a lifted label is a label for $A(y)$ identical
to $l(y)$. For a segment $x$ its lifted labels is therefore the multiset $L_x = (\mathcal{L}_x, N_x)$,
where $\mathcal{L}_x$ is the set of distinct lifted labels for $x$, and for each lifted label $\rho$,
$N_x(\rho)$ is the number of times $\rho$ appears as a lifted label for $x$, i.e.
$\mathcal{L}_x = \{l(y) : A(y) = x\} \subseteq \Sigma^*$ and $N_x : \mathcal{L}_x \to \mathbb{Z}^+$ such that
$N_x(\rho) = |\{y : A(y) = x, l(y) = \rho\}|$.

Definition 22. For a bond $\{y_\alpha, z_\beta\}$, a lifted bond is a bidirected edge $\{A(y_\alpha), A(z_\beta)\}$.
For a side $x_\alpha$ its lifted bonds is the multiset (again overloading notation)
$L_{x_\alpha} = (\mathcal{L}_{x_\alpha}, N_{x_\alpha})$, where $\mathcal{L}_{x_\alpha}$ is the set of distinct lifted bonds incident with $x_\alpha$,
and for each lifted bond $\{x_\alpha, w_\beta\}$, $N_{x_\alpha}(\{x_\alpha, w_\beta\})$ is the number of sides whose
lifting ancestor is $x_\alpha$ and which are connected by a bond to a side whose lifting
ancestor is $w_\beta$, i.e. $\mathcal{L}_{x_\alpha} = \{\{x_\alpha = A(y_\alpha), A(z_\beta)\} : \{y_\alpha, z_\beta\} \in E_G\}$ and
$N_{x_\alpha} : \mathcal{L}_{x_\alpha} \to \mathbb{Z}^+$ such that
$N_{x_\alpha}(\{x_\alpha, w_\beta\}) = |\{y_\alpha : \{y_\alpha, z_\beta\} \in E_G, A(y_\alpha) = x_\alpha, A(z_\beta) = w_\beta\}|$.

Definition 23. A history graph G with free-roots, lifted labels and lifted bonds
is a lifted graph $L(G)$.

Figure 4(A) shows an example lifted graph that outlines these concepts.

Note that for a side $x_\alpha$, $N_{x_\alpha}(\{x_\alpha, w_\beta\})$ gives the multiplicity of lifted bond
incidences with $x_\alpha$, not the multiplicity of $\{x_\alpha, w_\beta\}$, i.e. each lifted loop bond
contributes two to the multiplicity of incidences with a side while each lifted
non-loop bond contributes one to the multiplicity of incidences with a side.

Definition 24. A junction side is a most recent common ancestor (MRCA) of
two attached, indirectly related sides.

Definition 25. A lifted label $\rho$ of a labeled segment $x$ is trivial if $l(x) = \rho$,
else it is non-trivial. For a history graph $G$, a lifted bond $e = \{A(y_\alpha), A(z_\beta)\}$
is trivial if $e \in E_G$ and there exists no unattached junction side that is the
ancestor of $y_\alpha$ and descendant of $A(y_\alpha)$, or the ancestor of $z_\beta$ and descendant
of $A(z_\beta)$, else $e$ is non-trivial. For a segment or side $x$ the non-trivial lifted
labels or bonds, respectively, are $L'_x = (\mathcal{L}'_x, N'_x) \subseteq L_x$, where $\mathcal{L}'_x$ is the set of
non-trivial lifted labels or bonds and $L_x$ is the multiset of lifted labels and
bonds that includes multiplicities.

See Figure 4(A) for examples of trivial and non-trivial labels and bonds.
Figure 4: (A) The lifted graph for the history graph in Figure 2(B). The blue
and red lines represent, respectively, trivial and non-trivial lifted bonds. Similarly,
the blue and red stars represent, respectively, trivial and non-trivial lifted
labels. The free-roots are shown as a set of segments above the other segments,
with a grey line identifying their matching branch-tree. (B) The module graph
for (A). Lower case letters are used to identify the sides.
9 Ancestral Variation Graphs



We can now define a broad class of history graphs for which, we will demonstrate,
cost can be computed in polynomial time. To do this we will define ambiguity, 
information that is missing but when added allows the tractable assessment of 
cost. There are two types of ambiguity. 

Definition 26. The substitution ambiguity of a history graph $G$ is
$u_s(G) = \sum_{x \in V_{L(G)}} \max(0, |L'_x| - 1)$, i.e. the total number of non-trivial lifted labels in
excess of one per segment.

Substitution ambiguity reflects uncertainty about MRCA bases. The substitution
ambiguity of the history graph in Figure 2(B) is 1, as there exists one
segment with two non-trivial lifted labels.

Definition 27. The rearrangement ambiguity of a history graph $G$ is
$u_r(G) = \sum_{x \in V_{L(G)}} \max(0, |L'_{x_{head}}| - 1) + \max(0, |L'_{x_{tail}}| - 1)$, i.e. the total number of
non-trivial lifted bonds in excess of one per side.

Rearrangement ambiguity reflects uncertainty about MRCA bonds. The
rearrangement ambiguity of the history graph in Figure 2(B) is 5.

Definition 28. The ambiguity of a history graph $G$ is $u(G) = u_s(G) + u_r(G)$.
An ancestral variation graph (AVG) $H$ is a history graph such that $u(H) = 0$,
i.e. an unambiguous history graph.
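
A minimal sketch of the substitution half of this test follows. It assumes the lifted labels of each segment have already been collected into a `collections.Counter` (label to multiplicity), that unlabeled segments and free-roots carry the label `None`, and that non-trivial lifted labels are counted with multiplicity; these representational choices and the function names are assumptions for illustration. Rearrangement ambiguity is analogous, with sides and lifted bonds in place of segments and lifted labels.

```python
from collections import Counter


def nontrivial_labels(lifted, own_label):
    """Drop the trivial lifted label, i.e. the one equal to the segment's own."""
    return Counter({p: n for p, n in lifted.items() if p != own_label})


def substitution_ambiguity(lifted_labels, labels):
    """Total number of non-trivial lifted labels in excess of one per segment.

    lifted_labels: dict mapping a segment (or free-root) to a Counter of
    its lifted labels. labels: dict mapping segments to their own label,
    with None (or a missing key) meaning unlabeled.
    """
    total = 0
    for x, lifted in lifted_labels.items():
        size = sum(nontrivial_labels(lifted, labels.get(x)).values())
        total += max(0, size - 1)
    return total


# One segment labeled "A" receives lifted labels A, C and G: two of them
# are non-trivial, so the ambiguity is 1.
toy_lifted = {"x": Counter({"A": 1, "C": 1, "G": 1})}
toy_labels = {"x": "A"}
toy_ambiguity = substitution_ambiguity(toy_lifted, toy_labels)
```

A graph passes the substitution part of the AVG test exactly when this quantity is zero.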

Lemma 5. Evolutionary histories are AVGs. 

Proof. Easily verified. □ 

While evolutionary histories are AVGs, so are many other history graphs that
contain far less information. For example, Figure 5 shows an AVG extension of
the history graph in Figure 2(B) that is not an evolutionary history.



10 Bounds on Cost 

We provide trivially computable lower and upper bound cost functions for his- 
tory graphs that are tight for AVGs. 

Definition 29. For any pair of objects $a$ and $b$, $\delta_{a,b} = 1$ if $a = b$, else 0.
The lower bound substitution cost (LBSC) of a history graph $G$ is
$s_l(G) = \sum_{x \in V_{L(G)}} \max(0, |\mathcal{L}'_x| - \delta_{l(x),\emptyset})$, i.e. the total number of distinct nontrivial lifted
labels at all segments, less one for each unlabeled segment with a lifted label
(necessarily a free root).

Definition 30. The upper bound substitution cost (UBSC) of a history graph
$G$ is $s_u(G) = \sum_{x \in V_{L(G)}} \left( |L'_x| - \delta_{l(x),\emptyset} \times \max_{\rho} N'_x(\rho) \right)$, i.e. the total number
of not necessarily distinct nontrivial lifted labels at all vertices, less the number
of identical lifted labels of the most numerous type for each unlabeled segment
(again, necessarily a free root).

Figure 5: The lifted graph for an AVG with (simple) modules containing non-trivial
lifted bonds highlighted, using the same notation as in Figure 4(A).

The LBSC of the history graph in Figure 2(B) is 4 and its UBSC is 5. For
the AVG in Figure 5 LBSC = UBSC = 4.
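
The two substitution bounds can be sketched as follows, reading Definitions 29 and 30 from their verbal descriptions. The input is assumed to be, for each segment or free-root, a `collections.Counter` of its non-trivial lifted labels with strictly positive multiplicities, and unlabeled segments are assumed to carry the label `None`; the function names and this encoding are illustrative assumptions.

```python
from collections import Counter


def lower_bound_substitution_cost(nontrivial_lifted, labels):
    """Distinct non-trivial lifted labels, less one per unlabeled segment."""
    total = 0
    for x, lifted in nontrivial_lifted.items():
        unlabeled = 1 if labels.get(x) is None else 0
        total += max(0, len(lifted) - unlabeled)
    return total


def upper_bound_substitution_cost(nontrivial_lifted, labels):
    """All non-trivial lifted labels (with multiplicity), less the most
    numerous label type at each unlabeled segment."""
    total = 0
    for x, lifted in nontrivial_lifted.items():
        size = sum(lifted.values())
        if labels.get(x) is None and lifted:
            size -= max(lifted.values())
        total += size
    return total


# An ambiguous toy graph: a labeled segment and an unlabeled free-root.
toy_lifted = {
    "x": Counter({"C": 2, "G": 1}),  # x is labeled "A"
    "r": Counter({"A": 2, "T": 1}),  # r is an unlabeled free-root
}
toy_labels = {"x": "A", "r": None}
toy_lbsc = lower_bound_substitution_cost(toy_lifted, toy_labels)
toy_ubsc = upper_bound_substitution_cost(toy_lifted, toy_labels)
```

In the toy graph the bounds differ (3 versus 4). When every segment has at most one non-trivial lifted label, the two functions return the same value, which illustrates why the bounds coincide at zero ambiguity.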

Definition 31. The module graph of a history graph $G$ is a multi-graph in
which the vertices are the sides of segments in $L(G)$ that have incident bonds or
lifted bonds, and the edges are the bonds and lifted bonds in $L(G)$ incident with
these sides. Each connected component in a module graph is called a module.
The set of modules in the module graph for $G$ is denoted $M(G)$.

Figure 4(B) shows the modules for Figure 4(A). It is easily verified that for
a history graph that is an epoch, this definition is consistent with our earlier 
definition. 

Definition 32. The lower bound rearrangement cost (LBRC) for a history
graph $G$ is $r_l(G) = \sum_{M \in M(G)} \left( \lceil |V_M|/2 \rceil - 1 \right)$, where $V_M$ is the set of sides
in module $M$, i.e. the Yancopoulos cost of its module graph.

Definition 33. The upper bound rearrangement cost (UBRC) of a history graph
$G$ is $r_u(G) = \sum_{M \in M(G)} \ldots$, i.e. the total
number of non-trivial lifted bonds in $L(G)$ minus the number of modules in
$M(G)$ in which every side has exactly one incident non-trivial lifted edge.

The LBRC of the history graph in Figure 2(B) is 3 and its UBRC is 6. For
the AVG in Figure 5 LBRC = UBRC = 3.
Theorem 1. For any history graph $G$ and any cost function $c$, $c(s_l(G), r_l(G)) \le
C(G, c) \le c(s_u(G), r_u(G))$ with equality if $G$ is an AVG.

Proof. See Supplementary Appendix A. □ 

Theorem 1 demonstrates that LBSC and LBRC are lower bounds on cost,
UBSC and UBRC are upper bounds on cost, and that all these bounds become
tight at the point of zero ambiguity. This implies that to assess cost of an
arbitrary history graph G we need only search for extensions of G to the point
that they have zero ambiguity, therefore avoiding the need to generate a larger
number of potentially much larger evolutionary history extensions. For an AVG
H, as the lower and upper bounds on cost are equivalent, we write $r(H) =
r_l(H) = r_u(H)$ and $s(H) = s_l(H) = s_u(H)$.

11 G-Optimal AVGs 

We now work towards sampling AVG extensions of any given history graph G 
in order to assess cost and explore the set of most parsimonious interpretations 
of G, firstly by defining the notion of G-optimal AVGs. In what follows, we 
assume G is a history graph that is not an AVG, else, by Theorem 1, it is trivial
to assess its cost. 

Definition 34. For $G \preceq H$, H is G-parsimonious w.r.t. a cost function c if it is
an AVG and $C(G, c) = c(s(H), r(H))$.

The set of G-parsimonious AVGs is too big to explore directly because it is 
always infinite, i.e. any AVG can be extended in infinitely many ways by adding 
extraneous material without increasing substitution or rearrangement costs. To 
avoid the redundant sampling of AVG extensions of G and their own extensions 
we define the notion of minimality. 

Definition 35. For $G \preceq H$, H is G-minimal if it is an AVG and there does
not exist an AVG H' such that $G \preceq H' \prec H$.

The set of G-minimal AVGs contains those AVGs that can not be reduced 
without either ceasing to be AVGs or extensions of G. 

Definition 36. An AVG is G-optimal w.r.t. a cost function c if it is both 
G-parsimonious w.r.t. c and G-minimal. 

By definition, any G-parsimonious AVG is either G-minimal or has a 
G-minimal reduction, therefore we can implicitly represent and explore the set of 
most parsimonious interpretations of G by sampling the G-optimal AVGs. Unfortunately, 
because the history graph cost problem is NP-hard, it is unlikely that there 
is an efficient way to sample only the G-optimal AVGs. Instead, we must construct a 
finite bounding set that contains the G-optimal AVGs and that can be efficiently searched. It 
is especially convenient if the same bounding set works for all monotone cost 
functions. We construct such a bounding set now. 
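
To make Definitions 34 to 36 concrete, the Python sketch below filters a finite pool of candidate AVG extensions down to those that are both cheapest and minimal. It is only a toy under stated assumptions: the pool is assumed to contain a most parsimonious AVG and every AVG reduction of its members, and the names `g_optimal`, `cost` and `is_proper_reduction` are hypothetical.

```python
from typing import Callable, Iterable, List, TypeVar

H = TypeVar("H")


def g_optimal(candidates: Iterable[H],
              cost: Callable[[H], float],
              is_proper_reduction: Callable[[H, H], bool]) -> List[H]:
    """Return the candidates that are both parsimonious and minimal.

    is_proper_reduction(a, b) must be True when a is a strict reduction of b.
    """
    pool = list(candidates)
    if not pool:
        return []
    best = min(cost(h) for h in pool)
    # Parsimonious: cost equals the minimum over the pool (Definition 34).
    parsimonious = [h for h in pool if cost(h) == best]
    # Minimal: no other AVG in the pool is a strict reduction (Definition 35).
    return [h for h in parsimonious
            if not any(is_proper_reduction(other, h) for other in pool)]
```

In a toy setting where each candidate is a set of added elements and reduction is the proper-subset relation, a candidate that merely adds cost-free extraneous material to another is discarded, which mirrors the motivation for minimality given above.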

12 G-Bounded History Graphs 



Definition 37. A label of a segment x is a junction if $|L'_x| > 1$, i.e. x has 
more than one lifted label, else it is a bridge if $|L'_x| = 1$, $l(A(x)) = l(x)$ and 
$L'_{A(x)} \neq \{\}$, i.e. x has one lifted label, its lifted label is non-trivial, the most 
recent labeled ancestor of x is labeled the same as x and this ancestor has at 
least one non-trivial lifted label (see Figure 6(A,B)). 

Definition 38. A bond $\{x_\alpha, y_\beta\}$ is a junction if either $x_\alpha$ or $y_\beta$ is a junction 
side, i.e. it is the MRCA of two attached, indirectly related sides, else it is 
a bridge if $|L'_{x_\alpha}| \le 1$, $|L'_{y_\beta}| \le 1$, $L'_{x_\alpha} \cup L'_{y_\beta} \neq \{\}$, $\{A(x_\alpha), A(y_\beta)\}$ is a trivial 
lifted bond, $L'_{A(x_\alpha)} \cup L'_{A(y_\beta)} \neq \{\}$ and $|(L'_{A(x_\alpha)} \cup L'_{x_\alpha}) \setminus \{\{x_\alpha, y_\beta\}\}| \ge 2$ and/or 
$|(L'_{A(y_\beta)} \cup L'_{y_\beta}) \setminus \{\{x_\alpha, y_\beta\}\}| \ge 2$, i.e. analogously with the definition of bridge 
for a label, with the added provision that its removal from the graph must leave 
the bond $\{A(x_\alpha), A(y_\beta)\}$ a junction (see Figure 6(C,D)). 

Definition 39. An element is non-minimal if it is a branch with a free-child 
or free-parent, an isolated segment, or a label or bond that is neither a junction 
nor a bridge. 

Figure 6: (A) A junction label. (B) A bridge label. (C) A junction bond. 
(D) A bridge bond. (E) An example of a pair of ping-pong bonds. The named 
elements are outlined in red. 

Definition 40. For $G \preceq G'$, an element in G' is G-reducible if there exists a 
reduction operation in a reduction sequence from G' to G that either deletes 
the element if it is a bond, label or segment or contracts it if it is a branch. 

Definition 41. For $G \preceq G'$, the G-unbridged graph of G' is the reduction 
resulting from the deletion of all G-reducible bridge bonds in G'. 

Definition 42. The endpoint $y_\beta$ of a bond such that $y_\beta$ has no attached 
descendants, i.e. so that $L'_{y_\beta} = \{\}$, is hanging. A pair of hanging bonds e and 
e' such that e has an endpoint whose most recent attached ancestor is incident 
with e' form a pair of ping-pong bonds. We call e the ping bond and e' the pong 
bond (Figure 6(E)). 

Definition 43. For $G \preceq G'$, G' is G-bounded if it does not contain a G-reducible 
non-minimal element and its G-unbridged graph does not contain a G-reducible 
ping bond. 

Theorem 2. The G-bounded AVGs contain the G-optimal AVGs for every cost 
function. 

Proof. See Supplementary Appendix B. □ 

Importantly, the following theorem demonstrates that there is a constant k 
such that the cardinality of any G-bounded history graph is at most k times that of G. 

Theorem 3. A G-bounded history graph contains at most max(0, 10n - 8) 
G-reducible bonds and max(0, 2m - 2, 20n - 16, 20n + 2m - 18) additional 
segments, where n is the number of bonds in G and m is the number of labeled 
segments in G. This bound is tight for all values of n and m. 

The graphs in G-bounded are therefore only exponential, not infinite, in 
number. 

Proof. See Supplementary Appendix C. □ 
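
The bounds in Theorem 3 are simple to evaluate. The following Python sketch transcribes the two expressions directly; the function names are our own and are chosen only for illustration.

```python
def max_reducible_bonds(n: int) -> int:
    """Upper bound on G-reducible bonds, for n bonds in G (Theorem 3)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return max(0, 10 * n - 8)


def max_additional_segments(n: int, m: int) -> int:
    """Upper bound on additional segments, for n bonds and m labeled
    segments in G (Theorem 3)."""
    if n < 0 or m < 0:
        raise ValueError("n and m must be non-negative")
    return max(0, 2 * m - 2, 20 * n - 16, 20 * n + 2 * m - 18)
```

For example, a graph G with n = 2 bonds and m = 3 labeled segments admits at most 12 G-reducible bonds and 28 additional segments in any G-bounded extension.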

13 The G-bounded Poset 

We now define an ordering between G-bounded history graphs, by defining a 
characteristic set of operations that allow navigation between them. 

Definition 44. For a segment x in a G-bounded history graph, the composite 
minimisation of x is as follows: 

• If x is unattached and unlabeled and has a G-reducible parent branch, the 
contraction of the parent branch. 

• If x is then an unattached, unlabeled root and has a single G-reducible 
child branch, the contraction of the child branch. 

• The deletion of x if subsequently isolated, unlabeled and G-reducible. 

Definition 45. In a G-bounded extension, for a G-reducible label of a 
segment x, a label detachment is the deletion of the label of x and then composite 
minimisation of x (Figure 7(A-C)). 

Definition 46. In a G-bounded extension, for a G-reducible bond $\{x_\alpha, y_\beta\}$, a 
bond detachment is the deletion of $\{x_\alpha, y_\beta\}$ and then composite minimisation 
of x and y (Figure 7(D-F)). 

Definition 47. In a G-bounded extension, for a pair of G-reducible junction 
bonds $\{x_\alpha, y_\beta\}$ and $\{w_\alpha, z_\beta\}$, such that $w_\alpha = A(x_\alpha)$ and $z_\beta = A(y_\beta)$, a 
lateral-bond detachment is the bond detachment of both $\{x_\alpha, y_\beta\}$ and $\{w_\alpha, z_\beta\}$ and 
subsequent bond attachment creating $\{v_\gamma, x_\alpha\}$, where v is newly created or an 
existing segment whose side was previously unattached (Figure 7(D-E)). 

Figure 7: A sequence of G-bounded extension operations that convert the graph 
in (A) into the AVG in (F). 



Definition 48. A G-bounded reduction operation on a G-bounded history graph 
is a label/bond/lateral-bond detachment operation that results in a G-bounded 
history graph. As with reduction operations, the inverse of a G-bounded 
reduction operation is a G-bounded extension operation. 

Definition 49. A G-bounded history graph G' is a G-bounded reduction (resp. 
extension) of another G-bounded history graph G" if G' is isomorphic to a 
graph that can be obtained from G" by a sequence of G-bounded reduction 
(resp. extension) operations. 

Lemma 6. The G-bounded reduction relation is a partial order. 

Proof. Easily verified. □ 

Definition 50. The G-bounded poset is the set of G-bounded history graphs 
with the G-bounded reduction relation. We write $\preceq_G$ to denote the G-bounded 
reduction relation and $\prec\!\cdot_G$ to denote its covering relation (transitive reduction). 

We characterise the G-bounded poset in the following theorem. 

Theorem 4. The G-bounded poset is finite, has a single least element G, its 
maximal elements are all AVGs, and $G' \prec\!\cdot_G G''$ if and only if there exists a 
G-bounded reduction operation that transforms G'' into G'. 

Proof. See Supplementary Appendix D. □ 

As the G-bounded poset is finite, it can be represented by a Hasse diagram 
whose nodes are the G-bounded history graphs and whose edges, which are the 
covering relation, represent equivalence classes of G-bounded operations. Figure 
8 shows a simple G-bounded poset Hasse diagram. 
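
The edges of such a Hasse diagram can be computed from any finite strict order by transitive reduction. The Python sketch below is a generic illustration with opaque node identifiers; it is not specific to history graphs, and the function names are our own.

```python
from typing import Hashable, Iterable, Set, Tuple

Pair = Tuple[Hashable, Hashable]  # (lower, upper): lower is strictly below upper


def transitive_closure(pairs: Iterable[Pair]) -> Set[Pair]:
    """Close a finite relation under transitivity."""
    closure = set(pairs)
    while True:
        new_pairs = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


def covering_relation(pairs: Iterable[Pair]) -> Set[Pair]:
    """Return the Hasse diagram edges of the strict partial order generated
    by `pairs`: those (a, b) with no z strictly between a and b."""
    order = transitive_closure(pairs)
    elements = {x for pair in order for x in pair}
    return {(a, b) for (a, b) in order
            if not any((a, z) in order and (z, b) in order for z in elements)}
```

Given the pairs (G, A), (A, B), (G, B) and (G, C), the pair (G, B) is dropped because A lies between G and B, leaving the three covering edges.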

Figure 8: A Hasse diagram of the G-bounded poset for an example history 
graph. 



14 A Basic Implementation 

The previous four theorems establish the mechanics of everything we need to 
sample, crudely, in finite time the set of G-optimal AVGs, and thus, amongst 
other things, determine the cost of a history graph. Although it will require 
considerable further work to establish practical, efficient sampling algorithms, 
we have implemented a simple graph library in Python that for an input 
history graph G can sample G-bounded AVGs (https://github.com/dzerbino/pyAVG) 
by generating sequences of G-bounded extension operations. 

To test the library we used simulations. For each simulation we first gener- 
ated an evolutionary history H by forward simulation, starting from a genome 
with 5 segments in a single thread and simulating through 4 epochs in which 
either whole chromosome replication or rearrangements occurred and substitu- 
tions were made at a constant rate at each branch. The alphabet of labels in the 
simulation had cardinality four, i.e. as in the case with single base labels. To 
ensure we considered interesting histories we considered only those with both 
substitutions and rearrangements that had been through at least two epochs of 
replication. 

To create a reduction G of H we removed from H all labels of internal seg- 
ments and bonds connecting internal segments and then contracted the parent 
branch of all internal segments, so that the beginning history contained only the 
leaf threads and branch trees that, containing no internal segments, simply rep- 
resented the homologies between the segments. To represent further potential 
uncertainty about the leaf threads we then randomly removed, on average, 10% 
of the bonds, labels and segments from these threads. From G we then sampled 
G-bounded AVGs, each created by iteratively sampling G-bounded extension 
operations starting from G, picking each G-bounded extension operation in the 
sequence at random from those possible. Figure 9 shows the upper and lower 
bound rearrangement cost of members of the sequences of G-bounded 
extensions, until we reached zero ambiguity, for each of 2,000 starts. We observe that 
the most parsimonious AVGs sampled involve relatively few extension opera- 
tions, while the least parsimonious generally have more. 
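
The random sampling procedure described above can be sketched as a simple loop. The Python code below is a minimal illustration, not the pyAVG implementation: `list_extensions` and `is_avg` are hypothetical callbacks standing in for the enumeration of G-bounded extension operations and the zero-ambiguity test, and the integer "graphs" are a toy stand-in.

```python
import random
from typing import Callable, List, Sequence, TypeVar

Graph = TypeVar("Graph")


def sample_extension_sequence(start: Graph,
                              list_extensions: Callable[[Graph], Sequence[Graph]],
                              is_avg: Callable[[Graph], bool],
                              rng: random.Random,
                              max_steps: int = 10_000) -> List[Graph]:
    """Apply uniformly random extensions from `start` until an AVG is reached.

    Returns the whole sequence of graphs visited, ending with the AVG.
    """
    path = [start]
    current = start
    for _ in range(max_steps):
        if is_avg(current):
            return path
        options = list_extensions(current)
        if not options:
            raise ValueError("non-AVG graph with no available extension")
        current = rng.choice(options)
        path.append(current)
    raise RuntimeError("no AVG reached within max_steps")


# Toy stand-in: a "graph" is its remaining ambiguity, and each extension
# removes one or two units of it. Zero ambiguity plays the role of an AVG.
def toy_extensions(ambiguity: int) -> List[int]:
    return [ambiguity - k for k in (1, 2) if ambiguity - k >= 0]


toy_path = sample_extension_sequence(5, toy_extensions, lambda a: a == 0,
                                     random.Random(0))
```

Recording the cost bounds of every graph in the returned path gives the kind of per-sequence trajectory plotted in Figure 9.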

To investigate a more efficient search strategy, in a subsequent experiment 
we restarted the search if we reached an extension with a higher total sum of 
lower bound substitution and rearrangement costs than, initially, $s_u(G) + r_u(G)$, 
and subsequently the sum of the substitution and rearrangement costs of 
the best AVG found up to that point. We sampled 20,000 starts/restarts for 
each of 20 randomly sampled pairs (H, G). For each graph G, we recorded the 
substitution and rearrangement costs s(H) and r(H) of the evolutionary history 
from which G is derived, the ambiguity, lower cost bound, and upper cost bound 
for substitutions in G ($u_s(G)$, $s_l(G)$, $s_u(G)$, resp.), as well as for rearrangements 
($u_r(G)$, $r_l(G)$, $r_u(G)$, resp.), along with the minimum and maximum 
substitution (and rearrangement) costs for the G-bounded AVG extensions found by 
sampling, denoted $s(H_{smin})$, $s(H_{smax})$, and $r(H_{rmin})$, $r(H_{rmax})$, resp. 
Tables 1 and 2 show the results of these 20 sampling runs. For these simulations 
$r(H_{rmin})$ is often close or equal to $r_l(G)$, while $r(H_{rmax})$ is generally slightly 
greater than $r_u(G)$. Notably, we found that AVG extensions sometimes had 
lower cost than the original evolutionary history, this occurring because of the 
information loss that resulted from reducing H to G. Figure 10 shows one 
example of H, G and a sampled AVG that is an example of $H_{rmin}$ and $H_{smin}$ 
such that $r(H) = r_l(G) = r(H_{rmin}) = 2$ (i.e. a minimum possible 
rearrangement cost extension), and $s(H) = 3$ but $s_l(G) = s(H_{rmin}) = 1$ (i.e. also a 
minimum possible substitution cost extension, and with lower substitution cost 
than the original history). The G-bounded extension sequence from G to this 
AVG involved the creation of just 7 bonds, 5 segments and 7 labels. 
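
The restart strategy used in this experiment can be sketched as follows. This Python code is an illustration under stated assumptions rather than the pyAVG implementation: `lower_bound` is a hypothetical callback returning the sum of the lower bound substitution and rearrangement costs, and, because the bounds are tight for AVGs (Theorem 1), the same callback is used as the cost of an AVG.

```python
import random
from typing import Callable, Optional, Sequence, Tuple, TypeVar

Graph = TypeVar("Graph")


def restart_search(start: Graph,
                   list_extensions: Callable[[Graph], Sequence[Graph]],
                   is_avg: Callable[[Graph], bool],
                   lower_bound: Callable[[Graph], int],
                   initial_threshold: int,
                   n_starts: int,
                   rng: random.Random) -> Tuple[Optional[Graph], int]:
    """Random extension sampling that restarts once the lower bound on cost
    exceeds the current threshold.

    The threshold starts at `initial_threshold` (s_u(G) + r_u(G) in the text)
    and is lowered to the cost of the best AVG found so far.
    """
    best: Optional[Graph] = None
    threshold = initial_threshold
    for _ in range(n_starts):
        current = start
        while lower_bound(current) <= threshold:
            if is_avg(current):
                # For an AVG the lower bound equals its cost (Theorem 1).
                if best is None or lower_bound(current) < threshold:
                    best, threshold = current, lower_bound(current)
                break
            options = list_extensions(current)
            if not options:
                break
            current = rng.choice(options)
    return best, threshold
```

The function returns the best AVG found together with its cost, or `None` and the initial threshold if every start was abandoned before reaching an AVG.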

Repeating these experiments with histories that started with 10 root seg- 
ments in the evolutionary history, but which were otherwise simulated identi- 
cally, demonstrates that the naive random search procedure implemented here 
fails to find reasonable histories within a set of only 20,000 random samples 
(data not shown), so, as might be expected, more intelligent sampling strategies 
will be needed to find parsimonious interpretations of even moderately complex 
datasets. 



15 Discussion 

We have introduced a general parsimony model for genome evolution, though 
not without some seemingly arbitrary choices with respect to the particular 
reduction relation and the definition of the set G-bounded. In this discussion 
we highlight the reasons for our choice of reduction relation, how reduction 
relates to other orderings over graphs, and how we can easily approximate a 
set of G-reducible elements, something critical to the sampling of G-bounded 
extensions of a given graph. We then briefly discuss the possibilities of yet more 
compact graphical representations. 

Figure 9: Top main panel: The UBRC vs. the number of G-bounded 
extension operations performed for sampled G-bounded extensions. Bottom main 
panel: As top, but showing LBRC instead of UBRC. The contour lines show 
the number of extensions. The two labeled red paths show the cost bounds for 
individual extension sequences, one which results in a minimum rearrangement 
cost extension, and one which results in a maximum amongst those sampled 
rearrangement cost extension. 

Table 1: Results for substitution ambiguity and cost. Each row represents a 
separate initial evolutionary history. 

Figure 10: History graphs each shown with the addition of free roots and 
non-trivial lifted bonds. (A) H, (B) G, (C) An example of $H_{rmin}$ and $H_{smin}$. 
Example corresponds to experiment 1 in Tables 1 and 2. Segments are 
represented as circles, bonds as black arcs, with sides indicated by the orientations 
of the arrow heads at the ends of the arcs, branches are the dotted arrows, the 
colour depending on if their most recent labeled descendant generates a trivial 
or non-trivial lifted label. Non-trivial lifted bonds are shown in red. The free 
roots are shown as grey circles. Colours are used as labels. 

Table 2: Follows format of Table 1. 

In the reduction relation, we allow the deletion of segments, segment 
labels and bonds, but we forbid branch deletion. This is because in the opposite 
direction, used as an extension, it would allow the invention of homology 
between segments (see Figure 11(A)). Unlike branches, we forbid the contraction 
of bonds because in the opposite direction it would allow interstitial segments 
to be created without any rearrangements (see Figure 11(B)). 

We disallow the non-trivial contraction of the incoming branch of attached 
and labeled segments, with the one exception for branches with free-parents, 
because it would allow previously separate threads to merge in a reduction (see 
Figure 11(C)), and because segments could be reduced to become ancestors 
of originally indirectly related segments (see Figure 11(D)). We allow the one 
exception for the contraction of the incoming branch of attached or labeled 
segments when the branch has a free-parent because disallowing it would forbid 
reductions that removed information from root segments (see Figure 11(E)) and 
allowing it does not permit the issues highlighted in Figures 11(C-D). 

It is informative to consider the relationship between reduction operations 
and the reduction relation. When a graph contains multiple copies of isomorphic 
structures, distinct reduction operations can result in isomorphic reductions (see 
Figure 11(F-I)), therefore each possible reduction in the covering set (transitive 
reduction) of the reduction relation represents an equivalence class of reduction 
operations. 

Definition 51. A valid permutation of a reduction sequence is a permutation in 
which all operations remain reduction operations when performed in sequence. 

Clearly not all permutations of a reduction sequence have this property, 
however the following lemma illustrates the relationship between valid permu- 
tations. 

Lemma 7. All valid permutations of a reduction sequence create isomorphic 
reductions. 

Proof. Easily verified. □ 

Reduction is somewhat analogous to a restricted form of the graph minor. 
Importantly, the graph minor is a well-quasi-ordering (WQO) (Bienstock and 
Langston 1994), i.e. in any infinite set of graphs there exists a pair such that 
one is the minor of the other. 

Lemma 8. Reduction is not a WQO. 

Proof. Consider the infinite set of cyclic threads: none is a reduction of 
another. □ 

An ordering is a WQO if and only if every set has a finite subset of minimal 
elements. In contrast, it can be shown that for the reduction relation, even 
the set of AVG extensions of a single base history G can have an infinite set of 
minimal elements. 

Lemma 9. There exists a history graph G with an infinite number of G -minimal 
extensions. 

Proof. See Supplementary Appendix E. □ 

One barrier to exploring the G-bounded poset is deciding for a pair of history 
graphs G and G' such that $G \preceq G'$ if an element is G-reducible. This problem 
is of unknown complexity, and may well be NP-hard. To avoid the potential 
complexity of this problem we can define an alternative notion of reducibility. 

Definition 52. A fix for $(G, G')$, where $G \preceq G'$, is a history subgraph of 
$(V_{G'}, E_{G'}, B^+_{G'})$ isomorphic to G, where $B^+_{G'}$ is the transitive closure of $B_{G'}$. 

Starting from an input history graph G and a fix isomorphic to it, we can 
easily update the fix as we create extensions of G. For an extension of G, 
elements in the fix become the equivalent of G-irreducible, while elements not 
in the fix become the equivalent of G-reducible. From a starting graph we can 
therefore explore a completely analogous version of G-bounded, replacing the 
question of G-reducibility with membership of the fix. 
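
A deliberately simplified Python sketch of this bookkeeping is given below. It treats graph elements as opaque identifiers and ignores the subtlety that a branch of G may correspond to a path in the extension; the class and method names are hypothetical and are not drawn from pyAVG.

```python
from typing import Hashable, Iterable, Set


class FixTracker:
    """Track which elements of a growing extension belong to the fix."""

    def __init__(self, fix_elements: Iterable[Hashable]) -> None:
        self.fix: Set[Hashable] = set(fix_elements)       # elements matching G
        self.elements: Set[Hashable] = set(self.fix)      # all current elements

    def add_extension_element(self, element: Hashable) -> None:
        """Record an element created by an extension operation."""
        self.elements.add(element)

    def is_reducible(self, element: Hashable) -> bool:
        """Elements outside the fix play the role of G-reducible elements."""
        if element not in self.elements:
            raise KeyError(element)
        return element not in self.fix

    def remove(self, element: Hashable) -> None:
        """Remove an element, refusing to touch members of the fix."""
        if not self.is_reducible(element):
            raise ValueError("element is in the fix and cannot be removed")
        self.elements.remove(element)
```

Checking reducibility then becomes a constant-time set lookup, at the price of committing to a single fix.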

Figure 11: (A,B,C,D) The graphs on the left side are not reductions of the 
graphs on the right. (E) The graph on the left is a reduction of the graph on 
the right. (F,G,H,I) Examples of equivalence classes of reduction operations, 
where multiple distinct reduction operations result in the same reduction. 



Following from Lemma 7, there is a bijection between the set of fixes for 
$G \preceq G'$ and the set of equivalence classes of reduction sequences that are all valid 
permutations of each other. This is the limitation of considering membership of 
a fix instead of assessing if an element is G-reducible: it limits us to considering 
only a single equivalence class of reduction sequences in exploring the analogous 
poset to G-bounded. 

It is in general possible to reduce the size of the set G-bounded while still 
maintaining the properties that it can be efficiently sampled and contains G- 
optimal. However, this is likely to be at the expense of making the definition 
of G-bounded more complex. One approach is to add further "forbidden 
configurations" to the definition of G-bounded, like the G-reducible ping bonds 
that are forbidden in the current definition of G-bounded. Forbidding these was 
essential to making G-bounded finite, but we might consider also forbidding 
other configurations just to make G-bounded smaller. 

Finally, it is possible to consider a graph representation of histories that uses 
fewer segment nodes if we are willing to allow for the possibility that a subrange 
of the sequence of a segment be ancestral to a subrange of the sequence of 
another segment. This is a common approach in ancestral recombination graphs 
(Song and Hein 2005). Such a representation entails the additional complexity 
of needing to specify the sequence subranges for every branch, but may in some 
applications be a worthwhile trade-off for reducing the number of segments in 
the graph. The theory of such graphs is mathematically equivalent to the theory 
of the history graphs presented here, but the implementation would differ. 

16 Conclusion 



We have introduced a graph-based parsimony model in which a set of chro- 
mosomes evolves via the processes of whole chromosome replication, gain and 
loss, substitution and DCJ rearrangements. We have demonstrated upper and 
lower bounds on parsimony cost that are trivial to compute. Though these 
cost bounding functions are relatively simple and can almost certainly be tight- 
ened for many cases, importantly, despite their simplicity and the underlying 
intractability of the problem, they become tight for AVGs. This implies that 
we only need to reach AVG extensions to assess cost when sampling extensions. 

This is the first, to our knowledge, fully general parsimony model of chro- 
mosome evolution to be proposed. However, it is still limited in that it costs 
rearrangements and substitutions as crude sums over inferred events, and costs 
all rearrangements including recombinations as a function of the number of 
inferred breakends. In the future we therefore anticipate extensions that incor- 
porate more complex cost functions, as well as models that use a probabilistic 
framework in place of a parsimony framework. 

The constructive definition of the G-bounded poset, coupled with the upper 
and lower bound functions, provides a path towards simple branch and bound 
based sampling algorithms for exploring low-cost genome histories. Again, this is 
the first description of such an ordering over genome histories, and though more 
complex sampling strategies than those introduced here will likely need to be 
built upon the G-bounded poset, this should facilitate the practical exploration 
of the space of optimal and near optimal genome histories in this general model. 

In related work we consider the problem of sampling genome histories when 
the input genomes are also potentially unphased (Zerbino et al.), as is typical 
with current generation sequencing data, using a linear algebra framework that 
is a natural fit to describe mixtures of possible genomes and evolutionary opera- 
tions upon them. In the graph model of this paper, this is somewhat analogous 
to moving from requiring segments to be part of threads to being part of more 
general sequence graphs in which there is balance in the number of incidences 
with the two sides of each segment. This simplifies aspects of the problem while 
complicating other aspects, simplifying it in that homologous recombination op- 
erations turn out to be invisible, but complicating it in requiring more complex 
accounting of underlying possible histories. 